Feature Reconstruction Disentangling for Pose-invariant Face Recognition Supplementary Material

Authors

  • Xi Peng
  • Xiang Yu
  • Kihyuk Sohn
  • Dimitris N. Metaxas
  • Manmohan Chandraker
Abstract

Pose-variant face generation

We designed a network to predict 3DMM parameters from a single face image. The design is based mainly on VGG16 [4]: we keep the same number of convolutional layers but replace all max-pooling layers with stride-2 convolutions. The fully connected (fc) layers also differ: two fc layers of 1024 neurons each connect to the convolutional modules; then an fc layer of 30 neurons outputs the identity parameters, an fc layer of 29 neurons the expression parameters, and an fc layer of 7 neurons the pose parameters. Unlike [8], which uses 199 parameters to represent the identity coefficients, we truncate the identity eigenvectors to 30, which preserves 90% of the variation. This truncation leads to faster convergence and less overfitting. For texture, we only generate non-frontal faces from frontal ones, which significantly mitigates the texture-hallucination issue caused by self-occlusion and guarantees high-fidelity reconstruction. We apply the Z-Buffer algorithm used in [8] to prevent ambiguous pixel intensities that arise when different depths project to the same image-plane position.

Rich feature embedding

The rich embedding network is based mainly on the CASIA-net architecture [6], since it is widely used in prior approaches and achieves strong face recognition performance. During training, CASIA+MultiPIE or CASIA+300WLP are used. As shown in Figure 3 of the main submission, after the convolutional layers of CASIA-net we use a 512-d fc layer for the rich feature embedding, which is further branched into a 256-d identity feature and a 128-d non-identity feature. The 128-d non-identity feature is further connected to a 136-d landmark prediction and a 7-d pose prediction.
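The two fully connected heads described above can be sketched at the shape level as follows. This is a minimal illustration, not the authors' code: all weights are random stand-ins, and the flattened convolutional feature sizes (4096 for the VGG16-style trunk, 320 for CASIA-net) are assumptions; only the head dimensions follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, out_dim):
    """A fully connected layer with random weights (sketch only)."""
    w = rng.standard_normal((x.shape[-1], out_dim)) * 0.01
    return x @ w

# --- 3DMM parameter prediction head (on top of VGG16-style conv features) ---
conv_feat = rng.standard_normal((1, 4096))   # assumed flattened conv output size
h = fc(fc(conv_feat, 1024), 1024)            # two 1024-neuron fc layers
id_params   = fc(h, 30)                      # truncated identity basis (90% variance)
exp_params  = fc(h, 29)                      # expression coefficients
pose_params = fc(h, 7)                       # scale, pitch, yaw, roll, tx, ty, tz

# --- Rich feature embedding head (on top of CASIA-net conv features) ---
casia_feat = rng.standard_normal((1, 320))   # assumed conv output size
rich  = fc(casia_feat, 512)                  # 512-d rich feature embedding
f_id  = fc(rich, 256)                        # identity branch
f_nid = fc(rich, 128)                        # non-identity branch
landmarks = fc(f_nid, 136)                   # 68 landmarks x (x, y)
pose      = fc(f_nid, 7)                     # pose prediction

print(id_params.shape, rich.shape, f_id.shape, landmarks.shape)
```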
Notice that in the face generation network, the number of pose parameters is 7 instead of 3 because we must uniquely determine the projection from the 3D model to the 2D face shape in the image domain, which involves scale, pitch, yaw, roll, and the x, y, and z translations.

Disentanglement by feature reconstruction

Once the rich embedding network is trained, we feed in genuine pairs that share the same identity but differ in viewpoint to obtain the corresponding rich embeddings and the identity and non-identity features. To disentangle the identity and pose factors, we concatenate the identity and non-identity features and pass them through two 512-d fully connected layers to output a reconstructed rich embedding of 512 neurons. Both self- and cross-reconstruction losses are designed to eventually push the two identity features close to each other. At the same time, a cross-entropy loss is applied to the near-frontal identity feature to maintain the discriminative power of the learned representation. The disentanglement of identity and pose is finally achieved by the proposed feature-reconstruction-based metric learning.
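The role of the 7 pose parameters can be illustrated with a simple camera model. The sketch below assumes a weak-perspective (scaled orthographic) projection and one particular Euler-angle convention; the paper does not spell out the exact model here, so treat both as assumptions.

```python
import numpy as np

def euler_to_rot(pitch, yaw, roll):
    """Rotation matrix from Euler angles in radians (convention is illustrative)."""
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw),   np.sin(yaw)
    cz, sz = np.cos(roll),  np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project(points3d, scale, pitch, yaw, roll, tx, ty, tz):
    """Map Nx3 model vertices to Nx2 image coordinates using the 7 pose params."""
    R = euler_to_rot(pitch, yaw, roll)
    transformed = scale * (points3d @ R.T) + np.array([tx, ty, tz])
    return transformed[:, :2]   # orthographic drop of depth after translation

pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
# With zero rotation, scale 2 and translation (5, -3, 0):
print(project(pts, 2.0, 0.0, 0.0, 0.0, 5.0, -3.0, 0.0))  # [[5. -3.] [7. -3.]]
```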
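The self- and cross-reconstruction step can be sketched as follows. The feature pairing follows the text; the random weights, the ReLU between the two fc layers, and the squared-L2 loss form are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Shared weights of the two 512-d fully connected reconstruction layers.
W1 = rng.standard_normal((384, 512)) * 0.01   # 256-d id + 128-d non-id = 384-d input
W2 = rng.standard_normal((512, 512)) * 0.01

def reconstruct(f_id, f_nid):
    """Concatenate identity and non-identity features, then apply two fc layers
    to produce a reconstructed 512-d rich embedding."""
    h = np.concatenate([f_id, f_nid], axis=-1)
    return np.maximum(h @ W1, 0.0) @ W2       # fc -> ReLU -> fc

# A genuine pair: same identity, different viewpoints (random stand-ins here).
rich_a = rng.standard_normal((1, 512))        # rich embedding of image a
id_a, nid_a = rng.standard_normal((1, 256)), rng.standard_normal((1, 128))
id_b = rng.standard_normal((1, 256))          # identity feature of the paired image

self_rec  = reconstruct(id_a, nid_a)          # rebuild a from its own features
cross_rec = reconstruct(id_b, nid_a)          # swap in the paired identity feature

# Both reconstructions target the same rich embedding, which pushes the two
# identity features toward each other during training.
loss_self  = np.mean((self_rec  - rich_a) ** 2)
loss_cross = np.mean((cross_rec - rich_a) ** 2)
print(self_rec.shape, float(loss_self) >= 0.0, float(loss_cross) >= 0.0)
```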




Publication date: 2017